Semantic Relatedness Estimation using the Layout Information of Wikipedia Articles

نویسندگان

  • Patrick Chan
  • Yoshinori Hijikata
  • Toshiya Kuramochi
  • Shogo Nishida
چکیده

Computing the semantic relatedness between two words or phrases is an important problem in fields such as information retrieval and natural language processing. Explicit Semantic Analysis (ESA), a state-of-the-art approach to solve the problem uses word frequency to estimate relevance. Therefore, the relevance of words with low frequency cannot always be well estimated. To improve the relevance estimate of low-frequency words and concepts, the authors apply regression to word frequency, its location in an article, and its text style to calculate the relevance. The relevance value is subsequently used to compute semantic relatedness. Empirical evaluation shows that, for low-frequency words, the authors’ method achieves better estimate of semantic relatedness over ESA. Furthermore, when all words of the dataset are considered, the combination of the authors’ proposed method and the conventional approach outperforms the conventional approach alone. Semantic Relatedness Estimation using the Layout Information of Wikipedia Articles

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Advertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles

When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human level intelligence, it is inevitable to use the knowledge repositories to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM that requires a goldmine repository of knowledge. A recen...

متن کامل

Measuring Semantic Relatedness using Wikipedia Signed Network

Identifying the semantic relatedness of two words is an important task for the information retrieval, natural language processing, and text mining. However, due to the diversity of meaning for a word, the semantic relatedness of two words is still hard to precisely evaluate under the limited corpora. Nowadays, Wikipedia is now a huge and wiki-based encyclopedia on the internet that has become a...

متن کامل

Using a Wikipedia-based Semantic Relatedness Measure for Document Clustering

A graph-based distance between Wikipedia articles is defined using a random walk model, which estimates visiting probability (VP) between articles using two types of links: hyperlinks and lexical similarity relations. The VP to and from a set of articles is then computed, and approximations are proposed to make tractable the computation of semantic relatedness between every two texts in a large...

متن کامل

Non-Orthogonal Explicit Semantic Analysis

Explicit Semantic Analysis (ESA) utilizes the Wikipedia knowledge base to represent the semantics of a word by a vector where every dimension refers to an explicitly defined concept like a Wikipedia article. ESA inherently assumes that Wikipedia concepts are orthogonal to each other, therefore, it considers that two words are related only if they co-occur in the same articles. However, two word...

متن کامل

Exploring ESA to Improve Word Relatedness

Explicit Semantic Analysis (ESA) is an approach to calculate the semantic relatedness between two words or natural language texts with the help of concepts grounded in human cognition. ESA usage has received much attention in the field of natural language processing, information retrieval and text analysis, however, performance of the approach depends on several parameters that are included in ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • IJCINI

دوره 7  شماره 

صفحات  -

تاریخ انتشار 2013